Abstract: This article presents a new numerical abstract
domain for static analysis by abstract interpretation. It extends
a former numerical abstract domain based on Difference-Bound
Matrices and allows us to represent invariants of the form
(±x ± y ≤ c), where x and y are program variables and c
is a real constant.

We focus on giving an efficient representation based on
Difference-Bound Matrices, with O(n²) memory cost, where n is
the number of variables, and graph-based algorithms for all
common abstract operators, with O(n³) time cost. This includes a
normal form algorithm to test equivalence of representation and
a widening operator to compute least fixpoint approximations.

Index Terms — abstract interpretation, abstract domains, linear 
invariants, safety analysis, static analysis tools. 

I. Introduction 

This article presents practical algorithms to represent and
manipulate invariants of the form (±x ± y ≤ c), where x
and y are numerical variables and c is a numeric constant. It
extends the analysis we previously proposed in our PADO-II
article [1]. Sets described by such invariants are a special kind of
polyhedra called octagons because they feature at most eight
edges in dimension 2 (Figure 2). Using abstract interpretation,
this allows common errors, such as division by zero,
out-of-bound array access or deadlock, to be discovered
automatically and, more generally, safety properties of programs
to be proved.

Our method works well for reals and rationals. Integer 
variables can be assumed, in the analysis, to be real in order 
to find approximate but safe invariants. 

Example. The very simple program described in Figure 1
simulates M one-dimensional random walks of m steps
and stores the hits in the array tab. Assertions in curly 
braces are discovered automatically by a simple static analysis 
using our octagonal abstract domain. Thanks to the invariants 
discovered, we have the guarantee that the program does not 
perform out-of-bound array access at lines 2 and 10. The 
difficult point in this example is the fact that the bounds of 
the array tab are not known at the time of the analysis; thus, 
they must be treated symbolically. 

For the sake of brevity, we omit proofs of theorems in this
article. Complete proofs of all theorems can be found in
the author's Master's thesis [2].

II. Previous Work 

A. Numerical Abstract Domains. 

Static analysis has developed a successful methodology,
based on the abstract interpretation framework (see Cousot
and Cousot's POPL'77 paper [3]), to build analyzers that
discover invariants automatically: all we need is an abstract
domain, which is a practical representation of the invariants
we want to study, together with a fixed set of operators and
transfer functions (union, intersection, widening, assignment,
guard, etc.) as described in Cousot and Cousot's POPL'79
article [4].

 1 int tab[−m . . . m];
 2 for i = −m to m tab[i] = 0; {−m ≤ i ≤ m}
 3 for j = 1 to M do
 4   int a = 0;
 5   for i = 1 to m
 6     {1 ≤ i ≤ m; −i + 1 ≤ a ≤ i − 1}
 7     if rand(2) = 0
 8     then a = a + 1; {−i + 1 ≤ a ≤ i}
 9     else a = a − 1; {−i ≤ a ≤ i − 1}
10   tab[a] = tab[a] + 1; {−m ≤ a ≤ m}
11 done;

Fig. 1. Simulation of a random walk. The assertions in curly brackets { . . . }
are discovered automatically and prove that this program does not perform
any out-of-bound access to the array tab.

There exist many numerical abstract domains. The most
used are the lattice of intervals (described in Cousot and
Cousot's ISOP'76 article [5]) and the lattice of polyhedra
(described in Cousot and Halbwachs's POPL'78 article [6]).
They represent, respectively, invariants of the form
$(v \in [c_1, c_2])$ and $(\alpha_1 v_1 + \cdots + \alpha_n v_n \le c)$,
where $v, v_1, \ldots, v_n$ are program variables and
$c, c_1, c_2, \alpha_1, \ldots, \alpha_n$ are constants.
The interval analysis is very efficient (linear memory
and time cost) but not very precise, whereas the polyhedron
analysis is much more precise (Figure 2) but has a huge memory
cost: in practice, it is exponential in the number of variables.

Remark that the correctness of the program in Figure 1
depends on the discovery of invariants of the form
(a ∈ [−m, m]) where m must not be treated as a constant, but
as a variable, since its value is not known at analysis time. Thus,
this example is beyond the scope of interval analysis. It can
be solved, of course, using polyhedron analysis.

B. Difference-Bound Matrices. 

Several satisfiability algorithms for sets of constraints involving
only two variables per constraint have been proposed in
order to solve Constraint Logic Programming (CLP) problems.
Pratt analyses, in [7], the simple case of constraints of the
form (x − y ≤ c) and (±x ≤ c), which he called separation
theory. Shostak then extends this, in [8], to a loop residue
algorithm for the case (αx + βy ≤ c). However, the algorithm
is complete only for reals, not for integers. Recently, Harvey
and Stuckey proposed, in their ACSC'97 article [9], a complete
algorithm, inspired by [8], for integer constraints of the form
(±x ± y ≤ c).

Unlike CLP, when analyzing programs, we are not only
interested in testing the satisfiability of constraint sets; we also
need to manipulate them and apply operators that mimic the
ones used to define the semantics of programs (assignments,
tests, control flow junctions, loops, etc.).

The model-checking community has developed a practical
representation, called Difference-Bound Matrices (DBMs), for
constraints of the form (x − y ≤ c) and (±x ≤ c), together with
many operators, in order to model-check timed automata (see
Yovine's ES'98 article [10] and Larsen, Larsson, Pettersson,
and Yi's RTSS'97 article [11]). These operators are tied to
model checking and do not meet the abstract interpretation
needs. This problem was addressed in our PADO-II article [1]
and in Shaham, Kolodner, and Sagiv's CC2000 article [12],
which propose abstract domains based on DBMs, featuring
widenings and transfer functions adapted to real-life programming
languages. All these works are based on the concept of
shortest-path closure already present in Pratt's article [7] as
the base of the satisfiability algorithm for constraints of the
form (x − y ≤ c). The closure also leads to a normal form that
allows easy equality and inclusion testing. A good understanding
of the interactions between closure and the other operators is
needed to ensure the best possible precision and the termination
of the analysis. These interactions are described in our PADO-II
article [1].

Again, proof of the correctness of the program in Figure 1
is beyond the scope of the DBM-based abstract domains
presented in [1], [12] because the invariant (−a − m ≤ 0) we
need does not match (x − y ≤ c).

C. Our Contribution. 

Our goal is to propose a numerical abstract domain that is
between, in terms of expressiveness and cost, the interval and
the polyhedron domains. The set of invariants we discover
can be seen as special cases of linear inequalities, but the
underlying algorithms are very different from the ones used in
the polyhedron domain [6], and much more efficient.

In this article, we show that DBMs can be extended to
describe invariants of the form (±x ± y ≤ c). We build a
new numerical abstract domain, called the octagon abstract 
domain, extending the abstract domain we presented in our 
PADO-II article [1] and detail algorithms implementing all 
operators needed for abstract interpretation. Most algorithms 
are adapted from [1] but some are much more complex. In 
particular, the closure algorithm is replaced by a strong closure 
algorithm. 

It is very important to understand that an abstract domain
is only a brick in the design of a static analyzer. For the
sake of simplicity, this paper presents an application of our
domain to a simple forward analysis of a toy programming
language. However, one could imagine plugging this domain
into various analyses, such as Bourdoncle's Syntox analyzer
[13], Deutsch's pointer analysis [14], Dor, Rodeh, and Sagiv's
string cleanness checking [15], etc.

Fig. 2. A set of points (a), and its best approximation in the interval (b),
polyhedron (c), and octagon (d) abstract domains.

Section III recalls the DBM representation for potential
constraints (x − y ≤ c). Section IV explains how DBMs
can be used to represent a wider range of constraints: interval
constraints (±x ≤ c), and sum constraints (±x ± y ≤ c). We
then stick to this last extension, as it is the core contribution of
this article, and discuss normal forms in Section V
and operators and transfer functions in Section VI. Section
VII builds two lattice structures using these operators. Section
VIII presents some practical results and gives some ideas for
improvement.

III. Difference-Bound Matrices 

In this section, we recall some definitions and simple facts
about Difference-Bound Matrices (DBMs) and their use in
order to represent sets of invariants of the form (x − y ≤ c).
DBMs are described in [11], [10] from a model-checking point
of view and in [1] for abstract interpretation use.

Let $V = \{v_0, \ldots, v_{N-1}\}$ be a finite set of variables with
values in a numerical set $I$ (which can be $\mathbb{Z}$, $\mathbb{Q}$ or $\mathbb{R}$).
We extend $I$ to $\bar{I}$ by adding the $+\infty$ element; the standard
operations $\le$, $=$, $+$, $\min$ and $\max$ are extended to $\bar{I}$ as usual.

A. Potential Constraints, DBMs. 

A potential constraint over $V$ is a constraint of the form
$(v_i - v_j \le c)$, with $v_i, v_j \in V$ and $c \in I$. Let $C$ be a set
of potential constraints over $V$. We suppose, without loss of
generality, that there do not exist two constraints $(v_i - v_j \le c)$
and $(v_i - v_j \le d)$ in $C$ with $c \ne d$. Then, $C$ can be represented
uniquely by an $N \times N$ matrix $m$ with elements in $\bar{I}$:

$$
m_{ij} \triangleq
\begin{cases}
c & \text{if } (v_j - v_i \le c) \in C, \\
+\infty & \text{elsewhere.}
\end{cases}
$$

$m$ is called a Difference-Bound Matrix (DBM).

Fig. 3. A DBM (a), its potential graph (b) and its V-domain (c).

B. Potential Graph. 

It is convenient to consider $m$ as the adjacency matrix of a
weighted graph $\mathcal{G}(m) = \{V, A, w\}$, called its potential graph,
and defined by:

$$
A \subseteq V \times V, \quad A \triangleq \{ (v_i, v_j) \mid m_{ij} < +\infty \},
\qquad
w \in A \mapsto I, \quad w((v_i, v_j)) \triangleq m_{ij} .
$$

We will denote by $\langle i_1, \ldots, i_k \rangle$ a finite sequence of nodes
representing a path from node $v_{i_1}$ to node $v_{i_k}$ in $\mathcal{G}(m)$. A cycle is
a path such that $i_1 = i_k$.

C. ⊴ Order.

The $\le$ order on $\bar{I}$ induces a point-wise partial order $\trianglelefteq$ on
the set of DBMs:

$$
m \trianglelefteq n \ \stackrel{\triangle}{\iff}\ \forall i, j,\ m_{ij} \le n_{ij} .
$$

The corresponding equality relation is simply the matrix
equality $=$.

D. V-domain. 

Given a DBM $m$, the subset of $V \mapsto I$ (which will be
often assimilated to a subset of $I^N$) verifying the constraints
$\forall i, j,\ v_j - v_i \le m_{ij}$ will be denoted by $\mathcal{D}(m)$ and called $m$'s
V-domain:

$$
\mathcal{D}(m) \triangleq \{ (s_0, \ldots, s_{N-1}) \in I^N \mid \forall i, j,\ s_j - s_i \le m_{ij} \} .
$$

By extension, we will call V-domain any subset of $V \mapsto I$
which is the V-domain of some DBM.
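
As a minimal illustration of the definitions of this section, the following Python sketch builds a DBM from a list of potential constraints and tests whether a valuation belongs to its V-domain. The function names `dbm_from_constraints` and `in_domain`, the triple encoding `(j, i, c)` for the constraint v_j − v_i ≤ c, and the example constraints are assumptions made for this sketch; `math.inf` plays the role of +∞.

```python
import math

INF = math.inf  # stands for the +infinity element added to I


def dbm_from_constraints(n, constraints):
    """Build an n x n DBM from triples (j, i, c) meaning v_j - v_i <= c.

    Convention of the text: m[i][j] = c if (v_j - v_i <= c) is present,
    +infinity elsewhere. The triple encoding is an assumption of this sketch.
    """
    m = [[INF] * n for _ in range(n)]
    for j, i, c in constraints:
        # If the same pair appears twice, keep the tightest bound.
        m[i][j] = min(m[i][j], c)
    return m


def in_domain(m, point):
    """Return True if the valuation (s_0, ..., s_{n-1}) is in the V-domain of m."""
    n = len(m)
    return all(point[j] - point[i] <= m[i][j]
               for i in range(n) for j in range(n))


# Hypothetical example: 1 <= v_1 - v_0 <= 4.
example = dbm_from_constraints(2, [(1, 0, 4), (0, 1, -1)])
```

With this example, `in_domain(example, (0, 2))` holds, whereas the valuation `(0, 5)` violates the upper bound and `(3, 3)` violates the lower bound.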

Remark 1: We have $m \trianglelefteq n \Longrightarrow \mathcal{D}(m) \subseteq \mathcal{D}(n)$, but
the converse is false. As a consequence, representation of V-domains
is not unique and we can have $\mathcal{D}(m) = \mathcal{D}(n)$ but
$m \ne n$ (Figure 4).

Fig. 4. Two different DBMs with the same V-domain. Remark that (a) and
(b) are not even comparable with respect to $\trianglelefteq$.



IV. Extending Difference-Bound Matrices 

Discovering invariants of the single potential form
(x − y ≤ c) is not very interesting; however, DBMs can be used
to represent broader constraint forms. In this section, we first
present briefly how to add interval constraints (±x ≤ c). This
extension is not new: [11], [1] use it instead of pure DBMs.
We then present our new extension allowing representation of
the more general constraints (±x ± y ≤ c).

A. Representing intervals. 

Given a finite set of variables $V^0 = \{v_0, \ldots, v_{N-1}\}$, in
order to represent constraints of the form $(v_i - v_j \le c)$ and
$(\pm v_i \le c)$, we simply add to $V^0$ a special variable, named
$\mathbf{0}$, which is supposed to be always equal to 0. Constraints of
the form $(v_i \le c)$ and $(v_j \ge d)$ can then be rewritten as
$(v_i - \mathbf{0} \le c)$ and $(\mathbf{0} - v_j \le -d)$, which are indeed potential
constraints over the set $V = \{\mathbf{0}, v_0, \ldots, v_{N-1}\}$.

We will use a $0$ superscript to denote that a DBM over $V$
represents a set of extended constraints over $V^0$. Given such a
DBM $m^0$, we will not be interested in its V-domain, $\mathcal{D}(m^0)$,
which is a subset of $V \mapsto I$, but in its $V^0$-domain, denoted by
$\mathcal{D}^0(m^0)$ and defined by:

$$
\mathcal{D}^0(m^0) \triangleq \{ (s_0, \ldots, s_{N-1}) \in I^N \mid (0, s_0, \ldots, s_{N-1}) \in \mathcal{D}(m^0) \} \subseteq V^0 \mapsto I .
$$

We will call $V^0$-domain any subset of $V^0 \mapsto I$ which is
the $V^0$-domain of some DBM $m^0$. As before,
$m^0 \trianglelefteq n^0 \Longrightarrow \mathcal{D}^0(m^0) \subseteq \mathcal{D}^0(n^0)$, but the converse is false.

B. Representing sums. 

We suppose that $V^+ = \{v_0, \ldots, v_{N-1}\}$ is a finite set of
variables. The goal of this article is to present a new DBM
extension adapted to represent constraints of the form
$(\pm v_i \pm v_j \le c)$, with $v_i, v_j \in V^+$ and $c \in I$.

In order to do this, we consider that each variable $v_i$ in $V^+$
comes in two flavors: a positive form $v_i^+$ and a negative form
$v_i^-$. We introduce the set $V = \{v_0^+, v_0^-, \ldots, v_{N-1}^+, v_{N-1}^-\}$
and consider DBMs over $V$. Within a potential constraint, a
positive variable $v_i^+$ will be interpreted as $+v_i$, and a negative
variable $v_i^-$ as $-v_i$; thus it is possible to represent $(v_i + v_j \le c)$
by $(v_i^+ - v_j^- \le c)$. More generally, any set of constraints of the
form $(\pm v_i \pm v_j \le c)$, with $v_i, v_j \in V^+$, can be represented by
a DBM over $V$, following the translation described in Figure 5.

Remark 2: We do not need to add a special variable $\mathbf{0}$ to
represent interval constraints as we did before. Constraints of
the form $(v_i \le c)$ and $(v_i \ge c)$ can be represented as
$(v_i^+ - v_i^- \le 2c)$ and $(v_i^- - v_i^+ \le -2c)$.

    constraint over $V^+$             constraint(s) over $V$
    $v_i - v_j \le c$ $(i \ne j)$     $v_i^+ - v_j^+ \le c$, $v_j^- - v_i^- \le c$
    $v_i + v_j \le c$ $(i \ne j)$     $v_i^+ - v_j^- \le c$, $v_j^+ - v_i^- \le c$
    $-v_i - v_j \le c$ $(i \ne j)$    $v_j^- - v_i^+ \le c$, $v_i^- - v_j^+ \le c$
    $v_i \le c$                       $v_i^+ - v_i^- \le 2c$
    $v_i \ge c$                       $v_i^- - v_i^+ \le -2c$

Fig. 5. Translation between extended constraints over $V^+$ and potential
constraints over $V$.
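
The translation of Figure 5 can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `pos`, `neg`, `add_potential` and `add_octagonal` are hypothetical, variable v_i is assumed to be stored at indices 2i (positive form) and 2i+1 (negative form), and the matrix entry m[i][j] bounds v_j − v_i as in Section III.

```python
import math

INF = math.inf


def pos(i):
    """Index of the positive form v_i^+ (assumed layout: 2i)."""
    return 2 * i


def neg(i):
    """Index of the negative form v_i^- (assumed layout: 2i + 1)."""
    return 2 * i + 1


def add_potential(m, j, i, c):
    """Record the potential constraint v_j - v_i <= c, keeping the tightest bound."""
    m[i][j] = min(m[i][j], c)


def add_octagonal(m, a, i, b, j, c):
    """Record a*v_i + b*v_j <= c, with a and b in {+1, -1}.

    With i == j and a == b, this encodes the unary bound 2*a*v_i <= c,
    so v_i <= d is written add_octagonal(m, +1, i, +1, i, 2 * d).
    """
    # p is the form equal to a*v_i, p_bar the form equal to -a*v_i.
    p, p_bar = (pos(i), neg(i)) if a > 0 else (neg(i), pos(i))
    # q is the form equal to b*v_j, q_bar the form equal to -b*v_j.
    q, q_bar = (pos(j), neg(j)) if b > 0 else (neg(j), pos(j))
    # a*v_i + b*v_j equals both p - q_bar and q - p_bar.
    add_potential(m, p, q_bar, c)
    add_potential(m, q, p_bar, c)


# Hypothetical example over two variables: v_0 + v_1 <= 3 and v_0 <= 1.
m = [[INF] * 4 for _ in range(4)]
add_octagonal(m, +1, 0, +1, 1, 3)
add_octagonal(m, +1, 0, +1, 0, 2)
```

Each binary constraint fills two entries of the matrix, which is the redundancy that the notion of coherence introduced next keeps under control.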



C. Index Notation. 

We will use a $+$ superscript to denote that a DBM over $V$
represents a set of extended constraints over $V^+$. Such a DBM
$m^+$ is a $2N \times 2N$ matrix with the following convention: a
row or column index of the form $2i$, $i < N$, corresponds to
the variable $v_i^+$ and an index of the form $2i + 1$, $i < N$,
corresponds to the variable $v_i^-$.

We introduce the $\cdot \mapsto \bar{\cdot}$ operator on indices defined by
$\bar{\imath} \triangleq i \oplus 1$, where $\oplus$ is the bit-wise exclusive or operator, so that,
if $i$ corresponds to $v_j^+$, then $\bar{\imath}$ corresponds to $v_j^-$, and if $i$
corresponds to $v_j^-$, then $\bar{\imath}$ corresponds to $v_j^+$.

D. Coherence. 

Figure 5 shows that some constraints over $V^+$ can be
represented by different potential constraints over $V$. A DBM
$m^+$ will be said to be coherent if two potential constraints
over $V$ corresponding to the same constraint over $V^+$ are either
both represented in $m^+$, or both absent. Thanks to the $\cdot \mapsto \bar{\cdot}$
operator we introduced, coherence can be easily characterized:

Theorem 1: $m^+$ is coherent $\iff \forall i, j,\ m^+_{ij} = m^+_{\bar{\jmath}\,\bar{\imath}}$.

□

In the following, DBMs with a + superscript will be 
assumed to be coherent. 
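
The index operator and the characterization of Theorem 1 translate directly into code. The sketch below is illustrative only: `bar` and `is_coherent` are hypothetical names, and the example matrix is an assumed encoding of the single constraint v_0 + v_1 ≤ 3 over two variables.

```python
import math

INF = math.inf


def bar(i):
    """Switch between the indices of v_j^+ (2j) and v_j^- (2j + 1)."""
    return i ^ 1  # bit-wise exclusive or with 1


def is_coherent(m):
    """Check the condition of Theorem 1: m[i][j] == m[bar(j)][bar(i)] for all i, j."""
    size = len(m)
    return all(m[i][j] == m[bar(j)][bar(i)]
               for i in range(size) for j in range(size))


# v_0 + v_1 <= 3 stored in both of its potential forms.
coherent = [
    [INF, INF, INF, INF],
    [INF, INF, 3, INF],
    [INF, INF, INF, INF],
    [3, INF, INF, INF],
]
```

Removing only one of the two entries equal to 3 would make `is_coherent` return `False`.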



E. V + -domain. 

As for the simple interval extension, the V-domain of a
DBM $m^+$ is not of interest: we need to get back to $V^+ \mapsto I$
and take into account the fact that variables in $V$ are not
independent but related by $v_i^+ = -v_i^-$. Thus, we define the
$V^+$-domain of $m^+$ and denote by $\mathcal{D}^+(m^+)$ the set:

$$
\mathcal{D}^+(m^+) \triangleq \{ (s_0, \ldots, s_{N-1}) \in I^N \mid (s_0, -s_0, \ldots, s_{N-1}, -s_{N-1}) \in \mathcal{D}(m^+) \} .
$$

We will call octagon any subset of $V^+ \mapsto I$ which is the
$V^+$-domain of some coherent DBM $m^+$. As before,
$m^+ \trianglelefteq n^+ \Longrightarrow \mathcal{D}^+(m^+) \subseteq \mathcal{D}^+(n^+)$, but the converse is false.




Fig. 6. A potential graph $\mathcal{G}(m^+)$ in $\mathbb{Z}$ with no strictly negative cycle (a) and
the corresponding $V^+$-domain (b) $\mathcal{D}^+(m^+) = \{(\frac{3}{2}, \frac{3}{2})\}$, which is empty in
$\mathbb{Z}^2$.

V. Emptiness Test and Normal Forms 

We saw in Figure 4 that two different DBMs can have the
same V-domain. Fortunately, there exists a normal form for
DBMs representing non-empty octagons.

In this section, we first recall the normal form for classical
DBMs $m$, and then show how it can be adapted to DBMs $m^+$
representing non-empty octagons. Unfortunately, our adaptation
does not work very well with integers.

The potential graph interpretation of DBMs will be very
helpful to understand the algorithms presented.

A. Emptiness Test. 

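The following graph-oriented theorem allows us to perform
emptiness testing for V-domains, $V^0$-domains and octagons:

Theorem 2:

1) $\mathcal{D}(m) = \emptyset \iff \mathcal{G}(m)$ has a cycle with a strictly
negative weight.

2) $\mathcal{D}(m^0) = \emptyset \iff \mathcal{D}^0(m^0) = \emptyset$.

3) If $I \ne \mathbb{Z}$, then $\mathcal{D}(m^+) = \emptyset \iff \mathcal{D}^+(m^+) = \emptyset$.
If $I = \mathbb{Z}$, then $\mathcal{D}(m^+) = \emptyset \Longrightarrow \mathcal{D}^+(m^+) = \emptyset$, but the
converse is false (Figure 6).

□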

If $I \ne \mathbb{Z}$, in order to check whether the $V^+$-domain of
a DBM $m^+$ is empty, we simply have to check for cycles
with a strictly negative weight in $\mathcal{G}(m^+)$ using, for example,
the well-known Bellman-Ford algorithm, which runs in $O(N^3)$
time and is described in Cormen, Leiserson and Rivest's
classical algorithmic textbook [16, §25.3].
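
The emptiness test can be sketched as follows, assuming DBMs stored as lists of lists with `math.inf` for absent constraints. The function name `has_negative_cycle` is hypothetical, and the use of a virtual source connected to every node with weight 0 is one common way of running Bellman-Ford on a graph that need not be connected; it is an assumption of this sketch rather than a detail given in the text.

```python
import math

INF = math.inf


def has_negative_cycle(m):
    """Detect a strictly negative cycle in the potential graph of the DBM m.

    Bellman-Ford from a virtual source linked to every node by a 0-weight
    edge: all distances start at 0 and edges are relaxed repeatedly.
    """
    n = len(m)
    dist = [0] * n
    for _ in range(n):
        changed = False
        for i in range(n):
            for j in range(n):
                # Edge (v_i, v_j) exists when m[i][j] is finite.
                if m[i][j] < INF and dist[i] + m[i][j] < dist[j]:
                    dist[j] = dist[i] + m[i][j]
                    changed = True
        if not changed:
            return False  # distances are stable: no negative cycle
    return True  # still relaxing after n full passes: negative cycle


# Hypothetical one-variable octagon DBM encoding v_0 <= 1 and v_0 >= 2.
empty = [[0, -4], [2, 0]]
```

Here `has_negative_cycle(empty)` is true, so by Theorem 2 the corresponding domain is empty when I is not the set of integers. Each pass costs O(N²) and there are at most N passes, which matches the O(N³) bound quoted above.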

Figure 6 gives an example where our algorithm fails when
dealing with integers. Indeed, we have
$\mathcal{D}(m^+) = \{ (3 + x, 3 - x, 3 + y, 3 - y) \mid \forall x, y \in \mathbb{Z} \}$, which is not empty, but all these
solutions over $\{v_0^+, v_0^-, v_1^+, v_1^-\}$ correspond to the singleton
$\{(3/2, 3/2)\}$ when we get back to $\{v_0, v_1\}$, which is not
an acceptable solution in $\mathbb{Z}^2$, so $\mathcal{D}^+(m^+)$ should be empty.
The problem is that a DBM $m^+$ with coefficients in $\mathbb{Z}$ can
represent constraints that use not only integer, but also half-integer
constants, such as $v_1 \ge 3/2$ in Figure 6.

B. Closure. 

Given a DBM $m$, the V-domain of which is not empty,
$\mathcal{G}(m)$ has no strictly negative cycle, so its shortest-path
closure (or simply closure) $m^*$ is well-defined by:

$$
\begin{cases}
m^*_{ii} \triangleq 0, \\
m^*_{ij} \triangleq \min\limits_{\substack{1 \le M \\ \langle i = i_1, i_2, \ldots, i_M = j \rangle}} \ \sum\limits_{k=1}^{M-1} m_{i_k i_{k+1}} & \text{if } i \ne j .
\end{cases}
$$

The idea of closure relies on the fact that, if
$\langle i = i_1, i_2, \ldots, i_M = j \rangle$ is a path from $v_i$ to $v_j$, then the constraint
$v_j - v_i \le \sum_{k=1}^{M-1} m_{i_k i_{k+1}}$ can be derived from $m$ by adding
the potential constraints $v_{i_{k+1}} - v_{i_k} \le m_{i_k i_{k+1}}$, $1 \le k \le M - 1$.
This is an implicit potential constraint as it does not
appear directly in $m$. In the closure, we replace each potential
constraint $v_j - v_i \le m_{ij}$ by the tightest implicit constraint we
can find by summation over paths of $\mathcal{G}(m)$ if $i \ne j$, or by
$0$ if $i = j$ ($0$ is indeed the smallest value taken by $v_i - v_i$).
We have the following theorem:

Theorem 3:

1) $m = m^* \iff \forall i, j, k,\ m_{ij} \le m_{ik} + m_{kj}$ and
$\forall i,\ m_{ii} = 0$ (Local Definition).

2) $\forall i, j$, if $m^*_{ij} \ne +\infty$, then $\exists (s_0, \ldots, s_{N-1}) \in \mathcal{D}(m)$
such that $s_j - s_i = m^*_{ij}$ (Saturation).

3) $m^* = \inf_{\trianglelefteq} \{ n \mid \mathcal{D}(n) = \mathcal{D}(m) \}$ (Normal Form).

□

$$
\begin{aligned}
m_0 &\triangleq m, \\
m_{k+1} &\triangleq C_k(m_k) \quad \forall k,\ 0 \le k < N, \\
m^* &\triangleq m_N,
\end{aligned}
$$

where $C_k$ is defined, $\forall k$, by:

$$
[C_k(n)]_{ii} \triangleq 0, \qquad
[C_k(n)]_{ij} \triangleq \min(n_{ij},\ n_{ik} + n_{kj}) \quad \forall i \ne j .
$$

Fig. 7. Closure algorithm derived from the Floyd-Warshall shortest-path
algorithm.

Theorem 3.3 proves that the closure is indeed a normal
form. Theorem 3.1 leads to a closure algorithm inspired by
the Floyd-Warshall shortest-path algorithm. This algorithm is
described in Figure 7 and runs in $O(N^3)$ time. Theorem 3.2
is crucial to analyze precision of some operators (such as
projection and union).
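
The iteration of Figure 7 can be sketched in Python as follows. This is a minimal sketch under assumptions: DBMs are lists of lists, `math.inf` stands for +∞, the input is assumed to have a non-empty V-domain (no strictly negative cycle), and the names `closure_step` and `closure` are illustrative.

```python
import math

INF = math.inf


def closure_step(n, k):
    """One step C_k: allow paths going through node k, and force a zero diagonal."""
    size = len(n)
    return [[0 if i == j else min(n[i][j], n[i][k] + n[k][j])
             for j in range(size)]
            for i in range(size)]


def closure(m):
    """Shortest-path closure m*, assuming the V-domain of m is not empty."""
    cur = m
    for k in range(len(m)):
        cur = closure_step(cur, k)
    return cur


# Hypothetical example: v_1 - v_0 <= 2, v_2 - v_1 <= 3, v_2 - v_0 <= 10.
m = [[INF, 2, 10],
     [INF, INF, 3],
     [INF, INF, INF]]
closed = closure(m)
```

In this example the explicit bound 10 on v_2 − v_0 is replaced by the implicit bound 2 + 3 = 5 obtained along the path through v_1, and applying `closure` a second time leaves the result unchanged, as expected from a normal form.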

Remark 3: The closure is also a normal form for DBMs
representing non-empty $V^0$-domains:
$(m^0)^* = \inf_{\trianglelefteq} \{ n^0 \mid \mathcal{D}^0(n^0) = \mathcal{D}^0(m^0) \}$.

C. Strong Closure. 

We now focus on finding a normal form for DBMs representing
non-empty octagons. The solution presented above
does not work because two different DBMs can have the
same $V^+$-domain but different V-domains, and so the closure
$(m^+)^*$ of $m^+$ is not the smallest DBM, with respect to the
$\trianglelefteq$ order, that represents the octagon $\mathcal{D}^+(m^+)$. The problem
is that the set of implicit constraints gathered by summation
of constraints over paths of $\mathcal{G}(m^+)$ is not sufficient. Indeed,
we would like to deduce $(v_i^+ - v_j^- \le (c + d)/2)$ from
$(v_i^+ - v_i^- \le c)$ and $(v_j^+ - v_j^- \le d)$, which is not possible
because the set of edges $\{(v_i^-, v_i^+), (v_j^-, v_j^+)\}$ does not form
a path (Figure 9).

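Here is a more formal description of a normal form, called
the strong closure, adapted from the closure:

Definition 1: $m^+$ is strongly closed if and only if

• $m^+$ is coherent: $\forall i, j,\ m^+_{ij} = m^+_{\bar{\jmath}\,\bar{\imath}}$;

• $m^+$ is closed: $\forall i,\ m^+_{ii} = 0$ and $\forall i, j, k,\ m^+_{ij} \le m^+_{ik} + m^+_{kj}$;

• $\forall i, j,\ m^+_{ij} \le (m^+_{i\bar{\imath}} + m^+_{\bar{\jmath}j})/2$.

□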

From this definition, we derive the strong closure algorithm
$m^+ \mapsto (m^+)^\bullet$ described in Figure 8. The algorithm looks a
bit like the closure algorithm of Figure 7 and also runs in
$O(N^3)$ time. It uses two auxiliary functions $C^+_k$ and $S^+$. The
$C^+_k$ function looks like the $C_k$ function used in the closure
algorithm except it is designed to maintain coherence; each
$C^+_k$ application is a step toward closure. The $S^+$ function ensures
that $\forall i, j,\ [S^+(m^+)]_{ij} \le ([S^+(m^+)]_{i\bar{\imath}} + [S^+(m^+)]_{\bar{\jmath}j})/2$
while maintaining coherence.

There is no simple explanation for the complexity of $C^+_k$;
the five terms in the min statement appear naturally when
trying to prove that, when interleaving $C^+_k$ and $S^+$ steps, what
was gained before will not be destroyed in the next step.

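The following theorem holds for $I \ne \mathbb{Z}$:

Theorem 4:

1) $m^+ = (m^+)^\bullet \iff m^+$ is strongly closed.

2) $\forall i, j$, if $(m^+)^\bullet_{ij} \ne +\infty$, then $\exists (s_0, \ldots, s_{2N-1}) \in
\mathcal{D}(m^+)$ such that $\forall k,\ s_{2k} = -s_{2k+1}$ and $s_j - s_i =
(m^+)^\bullet_{ij}$ (Saturation).

3) $(m^+)^\bullet = \inf_{\trianglelefteq} \{ n^+ \mid \mathcal{D}^+(n^+) = \mathcal{D}^+(m^+) \}$ (Normal
Form).

□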

This theorem is very similar to Theorem 3. It states that,
when $I \ne \mathbb{Z}$, the strong closure algorithm gives a strongly
closed DBM (Theorem 4.1) which is indeed a normal form
(Theorem 4.3). The nice saturation property of Theorem 4.2
is useful to analyze the projection and union operators.
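
The strong closure iteration can be sketched in Python as follows. This is a hedged sketch, not a reference implementation: the names `c_step`, `s_step` and `strong_closure` are hypothetical, the five-term minimum and the choice of applying the closure step at the even indices 2k are assumptions about the content of Figure 8, and the input is assumed to be a coherent DBM with rational or real coefficients representing a non-empty octagon.

```python
import math

INF = math.inf


def bar(i):
    """Index of the opposite form of the same variable."""
    return i ^ 1


def c_step(n, k):
    """Closure step through the nodes k and bar(k), with a zero diagonal."""
    kb = bar(k)
    size = len(n)
    out = []
    for i in range(size):
        row = []
        for j in range(size):
            if i == j:
                row.append(0)
            else:
                row.append(min(
                    n[i][j],
                    n[i][k] + n[k][j],
                    n[i][kb] + n[kb][j],
                    n[i][k] + n[k][kb] + n[kb][j],
                    n[i][kb] + n[kb][k] + n[k][j],
                ))
        out.append(row)
    return out


def s_step(n):
    """Strengthening step: combine the unary bounds on v_i and v_j."""
    size = len(n)
    return [[min(n[i][j], (n[i][bar(i)] + n[bar(j)][j]) / 2)
             for j in range(size)]
            for i in range(size)]


def strong_closure(m):
    """Iterate a strengthening step after each closure step, once per variable."""
    cur = m
    for k in range(len(m) // 2):
        cur = s_step(c_step(cur, 2 * k))
    return cur


# Example in the spirit of Figure 9: 2*v_0 <= 2 and 2*v_1 <= 4.
m = [[0, INF, INF, INF],
     [2, 0, INF, INF],
     [INF, INF, 0, INF],
     [INF, INF, 4, 0]]
sc = strong_closure(m)
```

On this example the result contains the derived bound v_0 + v_1 ≤ 3 in both of its coherent positions, `sc[3][0]` and `sc[1][2]`, although no path of the potential graph yields it.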

D. Discussions about Z. 

Classical DBMs and the interval constraint extension work 
equally well on reals, rationals and integers. However, our 
extension does not handle integers properly. 

When $I = \mathbb{Z}$, the strong closure algorithm does not lead to
the smallest DBM with the same $V^+$-domain. For example,
knowing that $x$ is an integer, the constraint $2x \le 2c$ should be
deduced from $2x \le 2c + 1$, which the strong closure algorithm
fails to do. More formally, Definition 1 is not sufficient;
our normal form should also respect: $\forall i,\ m^+_{i\bar{\imath}}$ is even. One
can imagine simply adding to the strong closure algorithm a
rounding phase $R^+$ defined by $[R^+(m^+)]_{i\bar{\imath}} = 2\lfloor m^+_{i\bar{\imath}}/2 \rfloor$ and
$[R^+(m^+)]_{ij} = m^+_{ij}$ if $j \ne \bar{\imath}$, but it is tricky to make $R^+$
and $C^+_{2k}$ interact correctly so that we obtain a DBM which is both
closed and rounded. We were unable, at the time of writing,
to design such an algorithm and keep an $O(N^3)$ time cost.

This problem was addressed by Harvey and Stuckey in
their ACSC'97 article [9]. They propose a satisfiability
algorithm mixing closure and tightening steps that can be
used to test emptiness and build the normal form
$(m^+)^\bullet = \inf_{\trianglelefteq} \{ n^+ \mid \mathcal{D}^+(n^+) = \mathcal{D}^+(m^+) \}$ we need. Unfortunately,
this algorithm has an $O(N^4)$ time cost in the worst case. This
algorithm has the advantage of being incremental, with an $O(N^2)$
time cost per constraint changed in the DBM, which is useful
for CLP problems but does not seem interesting in static
analysis because many operators are point-wise and change
all $(2N)^2$ constraints in a DBM at once.

In practice, we suggest analyzing integer variables in $\mathbb{Q}$ or
$\mathbb{R}$, as is commonly done for polyhedron analysis [6]. This
method will add noise solutions, which is safe in the abstract
interpretation framework because we are only interested in an
upper approximation of program behaviors.

$$
\begin{aligned}
m^+_0 &\triangleq m^+, \\
m^+_{k+1} &\triangleq S^+(C^+_{2k}(m^+_k)) \quad \forall k,\ 0 \le k < N, \\
(m^+)^\bullet &\triangleq m^+_N,
\end{aligned}
$$

where $C^+_k$ is defined, $\forall k$, by:

$$
[C^+_k(n^+)]_{ii} \triangleq 0, \qquad
[C^+_k(n^+)]_{ij} \triangleq \min\bigl( n^+_{ij},\ (n^+_{ik} + n^+_{kj}),\ (n^+_{i\bar{k}} + n^+_{\bar{k}j}),\ (n^+_{ik} + n^+_{k\bar{k}} + n^+_{\bar{k}j}),\ (n^+_{i\bar{k}} + n^+_{\bar{k}k} + n^+_{kj}) \bigr),
$$

and $S^+$ is defined by:

$$
[S^+(n^+)]_{ij} \triangleq \min\bigl( n^+_{ij},\ (n^+_{i\bar{\imath}} + n^+_{\bar{\jmath}j})/2 \bigr) .
$$

Fig. 8. Strong Closure algorithm.

Fig. 9. A DBM (a) and its strong closure (b). Note that (a) is closed, and
that (a) and (b) have the same $V^+$-domain but not the same V-domain. In
(b), we deduced $(v_0 + v_1 \le 3)$ from $(2v_0 \le 2)$ and $(2v_1 \le 4)$, so it is
smaller than (a) with respect to $\trianglelefteq$.

VI. Operators and Transfer Functions

In this section, we describe how to implement the abstract
operators and transfer functions needed for static analysis.

These are the generic ones described in [5] for the interval
domain, and in [6] for the polyhedron domain: assignments,
tests, control flow junctions and loops. See Section VIII
for an insight on how to use these operators to actually
build an analyzer. If our abstract numerical domain is used
in a more complex analysis or in a parameterized abstract
domain (backward and interprocedural analysis, such as in
Bourdoncle's Syntox analyzer, Deutsch's pointer analysis
[14], etc.), one may need to add some more operators.

All the operators and transfer functions presented in this
section obviously respect coherence and are adapted from our
PADO-II article [1].

A. Equality and Inclusion Testing.

We distinguish two cases. If one or both $V^+$-domains are
empty, then the test is obvious. If none are empty, we use the
following theorem which relies on the properties of the strong
closure:

Theorem 5:

1) $\mathcal{D}^+(m^+) \subseteq \mathcal{D}^+(n^+) \iff (m^+)^\bullet \trianglelefteq n^+$;

2) $\mathcal{D}^+(m^+) = \mathcal{D}^+(n^+) \iff (m^+)^\bullet = (n^+)^\bullet$.

□



B. Projection. 

Thanks to the saturation property of the strong closure, we
can easily extract from a DBM $m^+$ representing a non-empty
octagon the interval in which a variable $v_i$ ranges:

Theorem 6:

$$
\{ t \mid \exists (s_0, \ldots, s_{N-1}) \in \mathcal{D}^+(m^+) \text{ such that } s_i = t \}
= \bigl[ -(m^+)^\bullet_{2i\,2i+1}/2,\ (m^+)^\bullet_{2i+1\,2i}/2 \bigr]
$$

(interval bounds are included only if finite).

□ 
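
Theorem 6 amounts to reading two entries of the strongly closed matrix. The sketch below assumes that its argument is already strongly closed and represents a non-empty octagon; the name `variable_interval` and the example matrix (a hand-written strongly closed DBM for 2v_0 ≤ 2 and 2v_1 ≤ 4) are assumptions of this illustration.

```python
import math

INF = math.inf


def variable_interval(sc, i):
    """Return (low, high) for variable v_i from a strongly closed DBM sc.

    sc[2i][2i+1] bounds v_i^- - v_i^+ = -2*v_i and
    sc[2i+1][2i] bounds v_i^+ - v_i^- = 2*v_i.
    An infinite bound means the interval is open on that side.
    """
    low = -sc[2 * i][2 * i + 1] / 2
    high = sc[2 * i + 1][2 * i] / 2
    return low, high


# Hand-written strongly closed DBM encoding v_0 <= 1 and v_1 <= 2.
sc = [[0, INF, INF, INF],
      [2, 0, 3, INF],
      [INF, INF, 0, INF],
      [3, INF, 4, 0]]
```

Both variables are unbounded below in this example, so the lower bounds come out as `-math.inf`.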



C. Union and Intersection. 

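The max and min operators on $\bar{I}$ lead to point-wise least
upper bound $\vee$ and greatest lower bound $\wedge$ (with respect to
the $\trianglelefteq$ order) operators on DBMs:

$$
[m^+ \wedge n^+]_{ij} \triangleq \min(m^+_{ij}, n^+_{ij}); \qquad
[m^+ \vee n^+]_{ij} \triangleq \max(m^+_{ij}, n^+_{ij}) .
$$

These operators are useful to compute intersections and
unions of octagons:

Theorem 7:

1) $\mathcal{D}^+(m^+ \wedge n^+) = \mathcal{D}^+(m^+) \cap \mathcal{D}^+(n^+)$.

2) $\mathcal{D}^+(m^+ \vee n^+) \supseteq \mathcal{D}^+(m^+) \cup \mathcal{D}^+(n^+)$.

3) If $m^+$ and $n^+$ represent non-empty octagons, then:
$$
((m^+)^\bullet) \vee ((n^+)^\bullet) = \inf\nolimits_{\trianglelefteq} \{ o^+ \mid \mathcal{D}^+(o^+) \supseteq \mathcal{D}^+(m^+) \cup \mathcal{D}^+(n^+) \} .
$$

□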

Remark that the intersection is always exact, but the union
of two octagons is not always an octagon, so we compute
an upper approximation. In order to get the best (smallest)
approximation for the union, we need to use the strong closure
algorithm, as stated in Theorem 7.3.

Another consequence of Theorem 7.3 is that if the two
arguments of $\vee$ are strongly closed, then the result is also
strongly closed. Dually, the arguments of $\wedge$ do not need to
be strongly closed in order to get the best precision, but the
result is seldom strongly closed, even if the arguments are.
This situation is similar to what is described in our PADO-II
article [1]. Shaham, Kolodner, and Sagiv fail to analyze
this result in their CC2000 article [12] and perform a useless
closure after the union operator.
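
Both operators are point-wise and therefore take only a few lines. In the sketch below, `meet` and `join` are hypothetical names for the greatest lower bound and least upper bound on DBMs, and the two one-variable matrices are assumed strongly closed encodings of v_0 ∈ [0, 1] and v_0 ∈ [3, 4].

```python
def meet(m, n):
    """Point-wise minimum: exact intersection of the two octagons."""
    return [[min(a, b) for a, b in zip(row_m, row_n)]
            for row_m, row_n in zip(m, n)]


def join(m, n):
    """Point-wise maximum: upper approximation of the union.

    The result is the best one when both arguments are strongly closed.
    """
    return [[max(a, b) for a, b in zip(row_m, row_n)]
            for row_m, row_n in zip(m, n)]


# Entry [1][0] bounds 2*v_0 from above, entry [0][1] bounds -2*v_0.
a = [[0, 0], [2, 0]]    # v_0 in [0, 1]
b = [[0, -6], [8, 0]]   # v_0 in [3, 4]
union = join(a, b)      # v_0 in [0, 4]
inter = meet(a, b)      # contradictory bounds: empty octagon
```

The join loses the gap between the two intervals, which is the expected approximation, while the meet contains a cycle of weight −6 + 2 < 0 and thus denotes the empty set.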

D. Widening. 

Program semantics often use fixpoints to model arbitrarily
long computations such as loops. Fixpoints are not computable
in the octagon domain, as is often the case for abstract
domains, because it is of infinite height. Thus, we define
a widening operator, as introduced in P. Cousot's thesis [17,
§4.1.2.0.4], to compute iteratively an upper approximation
of the least fixpoint $\bigvee_{i \in \mathbb{N}} F^i(m^+)$ greater than $m^+$ of an
operator $F$:

$$
[m^+ \nabla n^+]_{ij} \triangleq
\begin{cases}
m^+_{ij} & \text{if } n^+_{ij} \le m^+_{ij}, \\
+\infty & \text{elsewhere.}
\end{cases}
$$

The idea behind this widening is to remove from $m^+$ the
constraints that are not stable by union with $n^+$; thus it is
very similar to the standard widenings used on the domains
of intervals [5] and polyhedra [6]. [12] proposes a similar
widening on the set of DBMs representing V-domains.
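
A minimal sketch of this widening, assuming DBMs stored as lists of lists with `math.inf` for +∞; the name `widen` and the loop-counter example are assumptions made for illustration.

```python
import math

INF = math.inf


def widen(m, n):
    """Keep a bound of m only if n does not exceed it; otherwise drop it."""
    return [[a if b <= a else INF for a, b in zip(row_m, row_n)]
            for row_m, row_n in zip(m, n)]


# Hypothetical loop counter: v_0 = 0 before the loop, v_0 in [0, 1] after one turn.
before = [[0, 0], [0, 0]]      # 2*v_0 <= 0 and -2*v_0 <= 0
after = [[0, 0], [2, 0]]       # 2*v_0 <= 2 and -2*v_0 <= 0
widened = widen(before, after)
```

Only the upper bound on v_0 is unstable, so `widened` keeps the lower bound and represents v_0 ≥ 0; since each entry can only be dropped once, repeated widening must stabilize.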

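The following theorem proves that $\nabla$ is a widening in the
octagon domain:

Theorem 8:

1) $\mathcal{D}^+(m^+ \nabla n^+) \supseteq \mathcal{D}^+(m^+) \cup \mathcal{D}^+(n^+)$.

2) For all chains $(n^+_i)_{i \in \mathbb{N}}$, the chain defined by induction:
$$
m^+_i \triangleq
\begin{cases}
(n^+_0)^\bullet & \text{if } i = 0, \\
m^+_{i-1} \nabla ((n^+_i)^\bullet) & \text{elsewhere,}
\end{cases}
$$
is increasing, ultimately stationary, and with a limit
greater than $\bigvee_{i \in \mathbb{N}} (n^+_i)^\bullet$.

□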

As for the union operator, the precision of the $\nabla$ operator is
improved if its right argument is strongly closed; this is why
we ensure the strong closure of $n^+_i$ when computing $m^+_i$ in
Theorem 8.2.

One can be tempted to force the strong closure of the left
argument of the widening by replacing the induction step in
Theorem 8.2 by: $m^+_i = (m^+_{i-1} \nabla ((n^+_i)^\bullet))^\bullet$ if $i > 0$. However,
we cannot do this safely as Theorem 8.2 is no longer valid:
one can build a strictly increasing infinite chain $(m^+_i)_{i \in \mathbb{N}}$ (see
Figure 10), which means that fixpoints using this induction
may not be computable! This situation is similar to what is
described in our PADO-II article [1]. Shaham, Kolodner, and
Sagiv fail to analyze this problem in their CC2000 article
[12] and claim that all their computations are performed with
closed DBMs. If we want our analysis to terminate, it is
very important not to close the $(m^+_i)_{i \in \mathbb{N}}$ in the induction
computation.

E. Guard and Assignment. 

In order to analyze programs, we need to model the effect 
of tests and assignments. 



Fig. 10. Example of an infinite strictly increasing chain defined by $m^+_0 =
(n^+_0)^\bullet$, $m^+_i = (m^+_{i-1} \nabla ((n^+_i)^\bullet))^\bullet$. Remark that the nodes $\{v_0^-, v_1^-, v_2^-\}$
are not represented here due to lack of space; this part of the DBMs can be
easily figured out by coherence.



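Given a DBM $m^+$ that represents a set of possible values of
the variables $V^+$ at a program point, an arithmetic comparison
$g$, a variable $v_i \in V^+$, and an arithmetic expression $e$, we
denote by $m^+_{(g)}$ and $m^+_{(v_i \leftarrow e)}$ DBMs representing respectively
the set of possible values of $V^+$ if the test $g$ succeeds and
after the assignment $v_i \leftarrow e(v_0, \ldots, v_{N-1})$. Since the exact
representation of the resulting set is, in general, impossible,
we will only try to compute an upper approximation:

Property 1:

1) $\mathcal{D}^+(m^+_{(g)}) \supseteq \{ s \in \mathcal{D}^+(m^+) \mid s \text{ satisfies } g \}$.

2) $\mathcal{D}^+(m^+_{(v_i \leftarrow e)}) \supseteq \{ s[s_i \leftarrow e(s)] \mid s \in \mathcal{D}^+(m^+) \}$
(where $s[s_i \leftarrow x]$ means $s$ with its $i$-th component
changed into $x$).

□

Here is an example definition:

Definition 2:

1)
$$
[m^+_{(v_k + v_l \le c)}]_{ij} \triangleq
\begin{cases}
\min(m^+_{ij}, c) & \text{if } (j, i) \in \{(2k, 2l+1); (2l, 2k+1)\}, \\
m^+_{ij} & \text{elsewhere,}
\end{cases}
$$
and similarly for $m^+_{(v_k - v_l \le c)}$ and $m^+_{(-v_k - v_l \le c)}$.

2) $m^+_{(v_k \le c)} \triangleq m^+_{(v_k + v_k \le 2c)}$, and
$m^+_{(v_k \ge c)} \triangleq m^+_{(-v_k - v_k \le -2c)}$.

3) $m^+_{(v_k + v_l = c)} \triangleq (m^+_{(v_k + v_l \le c)})_{(-v_k - v_l \le -c)}$,
and similarly for $m^+_{(v_k - v_l = c)}$.

4) $[m^+_{(v_k \leftarrow v_k + c)}]_{ij} \triangleq m^+_{ij} + (\alpha_{ij} + \beta_{ij}) c$, with
$$
\alpha_{ij} \triangleq
\begin{cases}
+1 & \text{if } j = 2k, \\
-1 & \text{if } j = 2k+1, \\
0 & \text{elsewhere,}
\end{cases}
\quad \text{and} \quad
\beta_{ij} \triangleq
\begin{cases}
-1 & \text{if } i = 2k, \\
+1 & \text{if } i = 2k+1, \\
0 & \text{elsewhere.}
\end{cases}
$$

5)
$$
[m^+_{(v_k \leftarrow v_l + c)}]_{ij} \triangleq
\begin{cases}
c & \text{if } (j, i) \in \{(2k, 2l); (2l+1, 2k+1)\}, \\
-c & \text{if } (j, i) \in \{(2l, 2k); (2k+1, 2l+1)\}, \\
(m^+)^\bullet_{ij} & \text{if } i, j \notin \{2k, 2k+1\}, \\
+\infty & \text{elsewhere,}
\end{cases}
$$
for $k \ne l$.

6) In all other cases, we simply choose:
$$
m^+_{(g)} \triangleq m^+, \qquad
[m^+_{(v_k \leftarrow e)}]_{ij} \triangleq
\begin{cases}
(m^+)^\bullet_{ij} & \text{if } i, j \notin \{2k, 2k+1\}, \\
+\infty & \text{elsewhere.}
\end{cases}
$$

□

A. Coherent DBMs Lattice.

The set $\mathcal{M}^+$ of coherent DBMs, together with the order
relation $\trianglelefteq$ and the point-wise least upper bound $\vee$ and greatest
lower bound $\wedge$, is almost a lattice. It only needs a least element
$\bot$, so we extend $\trianglelefteq$, $\vee$ and $\wedge$ to $\mathcal{M}^+_\bot = \mathcal{M}^+ \cup \{\bot\}$ in an
obvious way to get $\sqsubseteq$, $\sqcup$ and $\sqcap$. The greatest element $\top$ is the
DBM with all its coefficients equal to $+\infty$.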

Remark that the assignment destroys informations about 
Vk and this could result in some implicit constraints about 
other variables being destroyed as well. To avoid precision 
degradation, we use constraints from the strongly closed form 
(m + )*j in Definitions |2]5 and[2]6. 

Remark also that the guard and assignment transfer func- 
tions are exact, except in the last — general — case of Definition 
[2] There exists certainly many ways to improve the precision 
of Definition [2]6. For example, in order to handle arbitrary 
assignment <— e, one can use the projection operator 
to extract the interval where the variables range, then use 
a simple interval arithmetic to compute an approximation 
interval [— e~/2,e + /2] where ranges the result 

[-e-,e+]De( [-(m+)5 ls (m+)U ... ^ 

[— (m )*at_2 2AT-1) ( m )*AT-1 2JV-2J ) 

and put back this information into m + : 

if i,j i {2fc,2fc+l}, 
if (i,j) = (2k + l,2k), 



m 



(>;,. 



A 



(m + )V 



if (i,j) = (2k,2k + l), 
elsewhere . 



Finally, remark that we can extend easily the guard operator 
to boolean formulas with the following definition: 



m 



Definition 3: 
1) 
2) 
3) 



+ 

(31 and g 2 ) 



A 



ASi) 



1+ 

(92 



(Si or g 2 ) 



((<))*) v ((»£.))"): 



m 



(-91) 



is settled by the classical transformation: 



and g 2 ) 

igi or g 2 ) 



VII. 



(-51) 
(-51) 



or (-152) 
and (^32) 



□ 



Lattice Structures 

In this section, we design two lattice structures: one on the 
set of coherent DBMs and one on the set of strongly closed 
DBMs. The first one is useful to analyze fixpoint transfers 
between abstract and concrete semantics, and the second one 
allows us to design a meaning function — or even a Galois 
connection — linking the set of octagons to the concrete lattice 
■p(V + 1 — ► I), following the abstract interpretation framework 
described in Cousot and Cousot's POPL'79 article [4]. 

Lattice structures and Galois connections can be used to 
simplify proofs of correctness of static analyses — see, for 
example, the author's Master thesis [2] for a proof of the 
correctness of the analysis described in Section VIII. 



Theorem 9:

1) $(\mathcal{M}^+_\bot, \sqsubseteq, \sqcap, \sqcup, \bot, \top)$ is a lattice.

2) This lattice is complete if $(\mathbb{I}, \le)$ is complete
($\mathbb{I} = \mathbb{Z}$ or $\mathbb{R}$, but not $\mathbb{Q}$).

□

There are, however, two problems with this lattice. First,
this lattice is not isomorphic to a sub-lattice of
$\mathcal{P}(\mathcal{V}^+ \mapsto \mathbb{I})$,
as two different DBMs can have the same $\mathcal{V}^+$-domain. Then,
the least upper bound operator $\sqcup$ is not the most precise upper
approximation of the union of two octagons because we do
not force the arguments to be strongly closed.
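Both problems can be observed on a two-variable example. The Python sketch below is an illustration under stated assumptions, not the paper's implementation: DBMs are plain lists of lists, `m[i][j]` bounds $v_j - v_i$, and the added least element is encoded as `None`. The names `BOTTOM`, `leq`, `join`, `meet` and `dbm` are hypothetical.

```python
import math

INF = math.inf
BOTTOM = None  # hypothetical encoding of the added least element

def leq(m, n):
    """Order extended with BOTTOM: point-wise comparison of coefficients."""
    if m is BOTTOM:
        return True
    if n is BOTTOM:
        return False
    return all(a <= b for rm, rn in zip(m, n) for a, b in zip(rm, rn))

def join(m, n):
    """Least upper bound: point-wise maximum, BOTTOM is neutral."""
    if m is BOTTOM:
        return n
    if n is BOTTOM:
        return m
    return [[max(a, b) for a, b in zip(rm, rn)] for rm, rn in zip(m, n)]

def meet(m, n):
    """Greatest lower bound: point-wise minimum, BOTTOM is absorbing."""
    if m is BOTTOM or n is BOTTOM:
        return BOTTOM
    return [[min(a, b) for a, b in zip(rm, rn)] for rm, rn in zip(m, n)]

def dbm(constraints, n=2):
    """Build a 2n x 2n DBM from {(i, j): c}, meaning v_j - v_i <= c."""
    m = [[INF] * (2 * n) for _ in range(2 * n)]
    for (i, j), c in constraints.items():
        m[i][j] = c
    return m

# x is v_0 (indices 0, 1) and y is v_1 (indices 2, 3).
X_LE_1 = {(1, 0): 2}                # 2x <= 2
Y_LE_1 = {(3, 2): 2}                # 2y <= 2
SUM_LE_2 = {(3, 0): 2, (1, 2): 2}   # x + y <= 2, stored twice by coherence

a = dbm({**X_LE_1, **Y_LE_1})                    # x <= 1, y <= 1
a_tight = dbm({**X_LE_1, **Y_LE_1, **SUM_LE_2})  # same set, implied bound explicit
b = dbm(SUM_LE_2)                                # x + y <= 2

same_matrix = (a == a_tight)   # two matrices for one octagon
loose = join(a, b)             # loses x + y <= 2
precise = join(a_tight, b)     # keeps x + y <= 2
```

Here `a` and `a_tight` describe the same set of points but are different matrices, which illustrates the first problem. Since that set is included in the half-plane described by `b`, the best approximation of the union is `b` itself; `join(a, b)` nevertheless returns the unconstrained DBM, while `join(a_tight, b)` returns `b`, which illustrates the second problem.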

B. Strongly Closed DBMs Lattice. 

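To overcome these difficulties, we build another lattice,
based on strongly closed DBMs. First, consider the set
$\mathcal{M}^\bullet_\bot$ of strongly closed DBMs $\mathcal{M}^\bullet$,
with a least element $\bot^\bullet$ added. Now, we define a greatest
element $\top^\bullet$, a partial order relation $\sqsubseteq^\bullet$,
a least upper bound $\sqcup^\bullet$ and a greatest lower bound
$\sqcap^\bullet$ in $\mathcal{M}^\bullet_\bot$ as follows:

$$\top^\bullet_{ij} \triangleq \begin{cases} 0 & \text{if } i = j, \\ +\infty & \text{elsewhere,} \end{cases}$$

$$m^+ \sqsubseteq^\bullet n^+ \iff \begin{cases} \text{either } m^+ = \bot^\bullet, \\ \text{or } m^+, n^+ \ne \bot^\bullet \text{ and } m^+ \trianglelefteq n^+, \end{cases}$$

$$m^+ \sqcup^\bullet n^+ \triangleq \begin{cases} m^+ & \text{if } n^+ = \bot^\bullet, \\ n^+ & \text{if } m^+ = \bot^\bullet, \\ m^+ \vee n^+ & \text{elsewhere,} \end{cases}$$

$$m^+ \sqcap^\bullet n^+ \triangleq \begin{cases} \bot^\bullet & \text{if } \bot^\bullet \in \{m^+, n^+\} \text{ or } \mathcal{D}^+(m^+ \wedge n^+) = \emptyset, \\ (m^+ \wedge n^+)^\bullet & \text{elsewhere.} \end{cases}$$

Thanks to Theorem 5.2, every non-empty octagon has a
unique representation in $\mathcal{M}^\bullet$; $\bot^\bullet$ is the
representation for the empty set. We build a meaning function
$\gamma$ which is an extension of $\mathcal{D}^+(\cdot)$ to
$\mathcal{M}^\bullet_\bot$:

$$\gamma(m^+) \triangleq \begin{cases} \emptyset & \text{if } m^+ = \bot^\bullet, \\ \mathcal{D}^+(m^+) & \text{elsewhere.} \end{cases}$$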



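Theorem 10:

1) $(\mathcal{M}^\bullet_\bot, \sqsubseteq^\bullet, \sqcap^\bullet, \sqcup^\bullet, \bot^\bullet, \top^\bullet)$
is a lattice and $\gamma$ is one-to-one.

2) If $(\mathbb{I}, \le)$ is complete, this lattice is complete and
$\gamma$ is meet-preserving:
$\gamma(\sqcap^\bullet X) = \bigcap \{ \gamma(x) \mid x \in X \}$. We
can, according to Cousot and Cousot [18, Prop. 7],
build a canonical Galois insertion:

$$\mathcal{P}(\mathcal{V}^+ \mapsto \mathbb{I}) \underset{\alpha}{\overset{\gamma}{\leftrightarrows}} \mathcal{M}^\bullet_\bot$$

where the abstraction function $\alpha$ is defined by:
$\alpha(X) = \sqcap^\bullet \{ x \in \mathcal{M}^\bullet_\bot \mid X \subseteq \gamma(x) \}$.

□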

The $\mathcal{M}^\bullet_\bot$ lattice features a nice meaning
function and a precise union approximation; thus, it is tempting to
force all our operators and transfer functions to live in
$\mathcal{M}^\bullet_\bot$ by forcing strong closure on their result.
However, we saw this does not work for the widening, so fixpoint
computations must be performed in the $\mathcal{M}^+_\bot$ lattice.
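The operations of the strongly closed lattice can be sketched in a few lines of Python. This is a hedged illustration, not the strong closure algorithm given earlier in the paper: it assumes real or rational bounds and uses a simple variant (a Floyd-Warshall shortest-path closure followed by one strengthening pass), and it encodes the least element as `None`. The names `BOTTOM`, `strong_closure`, `meet_closed` and `dbm` are hypothetical; `m[i][j]` bounds $v_j - v_i$.

```python
import math

INF = math.inf
BOTTOM = None  # hypothetical encoding of the least element

def strong_closure(m):
    """Return a strongly closed form of m, or BOTTOM if m has no solution.

    Simple variant for real or rational bounds: shortest-path closure,
    then one strengthening pass. Not the paper's own algorithm.
    """
    size = len(m)
    r = [row[:] for row in m]
    for i in range(size):
        r[i][i] = min(r[i][i], 0)
    for k in range(size):                      # Floyd-Warshall, cubic cost
        for i in range(size):
            for j in range(size):
                r[i][j] = min(r[i][j], r[i][k] + r[k][j])
    if any(r[i][i] < 0 for i in range(size)):  # negative cycle: empty octagon
        return BOTTOM
    # Strengthening: combine the unary bounds on v_i and v_j.
    # i ^ 1 swaps the positive and negative forms of a variable.
    return [[min(r[i][j], (r[i][i ^ 1] + r[j ^ 1][j]) / 2)
             for j in range(size)] for i in range(size)]

def meet_closed(m, n):
    """Greatest lower bound: closure of the point-wise minimum."""
    if m is BOTTOM or n is BOTTOM:
        return BOTTOM
    return strong_closure([[min(a, b) for a, b in zip(rm, rn)]
                           for rm, rn in zip(m, n)])

def dbm(constraints, n=2):
    """Build a 2n x 2n DBM from {(i, j): c}, meaning v_j - v_i <= c."""
    m = [[INF] * (2 * n) for _ in range(2 * n)]
    for (i, j), c in constraints.items():
        m[i][j] = c
    return m

# x is v_0 (indices 0, 1) and y is v_1 (indices 2, 3).
box = dbm({(1, 0): 2, (3, 2): 2})                         # x <= 1, y <= 1
box_explicit = dbm({(1, 0): 2, (3, 2): 2,
                    (3, 0): 2, (1, 2): 2})                # plus x + y <= 2
far = dbm({(0, 3): -6, (2, 1): -6})                       # x + y >= 6

same_normal_form = strong_closure(box) == strong_closure(box_explicit)
implied_sum_bound = strong_closure(box)[3][0]             # bound on x + y
empty_meet = meet_closed(strong_closure(box), strong_closure(far))
```

Two different matrices for the same octagon receive the same normal form, the implicit constraint $x + y \le 2$ becomes explicit, and an empty intersection is mapped to the least element, as in the definition of $\sqcap^\bullet$.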

VIII. Application to Program Analysis 

In this section, we present the program analysis based on
our new domain that enabled us to prove the correctness of
the program in Figure 1.

This is only one example application of our domain to
program analysis. It was chosen for its simplicity
of presentation and implementation. A fully featured tool
that can deal with real-life programs, taking care of pointers,
procedures and objects, is far beyond the scope of this work.
However, current tools using the interval or the polyhedron 
domains could benefit from this new abstract domain. 

A. Presentation of the Analysis. 

Our analyzer is very similar to the one described in Cousot 
and Halbwachs's POPL'78 article [6], except it uses our new 
abstract domain instead of the abstract domain of polyhedra. 

Here is a sketched description of this analysis; more
information, as well as proofs of its correctness, can be found in
the author's Master's thesis [2].

We suppose that our program is procedure-free, has only
numerical variables (no pointers or arrays) and is solely
composed of assignments, if then else fi and while do done
statements. Syntactic program locations $(l_i)$ are placed to
visualize the control flow: there are locations before and after
statements, at the beginning and the end of then and else
branches and inner loop blocks; the location at the program
entry point is denoted by $l_0$.

The analyzer associates to each program point $l_i$ an element
$m^+_i \in \mathcal{M}^+_\bot$. At the beginning, all $m^+_i$ are
$\bot$ (meaning the control flow cannot pass there) except
$m^+_0 = \top$. Then, information is propagated through the control
flow as if the program were executed:

• For $[\![\,(l_i)\ v_i \leftarrow e\ (l_{i+1})\,]\!]$, we set
$m^+_{i+1} = (m^+_i)_{(v_i \leftarrow e)}$.

• For a test $[\![\,(l_i)\ \text{if } g \text{ then } (l_{i+1}) \cdots \text{ else } (l_j) \cdots\,]\!]$,
we set $m^+_{i+1} = (m^+_i)_{(g)}$ and $m^+_j = (m^+_i)_{(\neg g)}$.

• When the control flow merges after a test
$[\![\,\text{then} \cdots (l_i) \text{ else} \cdots (l_j) \text{ fi } (l_{j+1})\,]\!]$,
we set $m^+_{j+1} = ((m^+_i)^\bullet) \sqcup ((m^+_j)^\bullet)$.

• For a loop $[\![\,(l_i) \text{ while } g \text{ do } (l_j) \cdots (l_k) \text{ done } (l_{k+1})\,]\!]$,
we must solve the relation $m^+_j = (m^+_i \sqcup m^+_k)_{(g)}$. We
solve it iteratively using the widening: suppose $m^+_i$ is
known and we can deduce a $m^+_k$ from any $m^+_j$ by
propagation; we compute the limit $m^+_j$ of

$$\begin{cases} m^+_{j,0} = (m^+_i)_{(g)} \\ m^+_{j,n+1} = m^+_{j,n} \triangledown \big((m^+_{k,n})^\bullet_{(g)}\big) \end{cases}$$

then $m^+_k$ is computed by propagation of $m^+_j$ and we set

$$m^+_{k+1} = ((m^+_i)_{(\neg g)}) \sqcup ((m^+_k)_{(\neg g)})$$

At the end of this process, each $m^+_i$ is a valid invariant that
holds at program location $l_i$. This method is called abstract
execution.
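The loop rule can be illustrated on a reduced version of the example program, keeping only the counter: `i <- 1; while i <= m do i <- i + 1 done`. The Python sketch below is not the paper's analyzer. It assumes that the widening keeps a coefficient only when the new iterate does not exceed it and sets it to $+\infty$ otherwise, omits the strong closure of the right argument for brevity (which can only lose precision, not soundness), and uses hypothetical names throughout; `m[i][j]` bounds $v_j - v_i$.

```python
import math

INF = math.inf

def top(n):
    """Unconstrained DBM over n variables (all coefficients +inf)."""
    return [[INF] * (2 * n) for _ in range(2 * n)]

def guard_diff_le(m, k, l, c):
    """Guard v_k - v_l <= c: tighten the two coherent coefficients."""
    r = [row[:] for row in m]
    for j, i in ((2 * k, 2 * l), (2 * l + 1, 2 * k + 1)):
        r[i][j] = min(r[i][j], c)
    return r

def assign_add(m, k, c):
    """Assignment v_k <- v_k + c, following item 4 of Definition 2."""
    r = [row[:] for row in m]
    for i in range(len(m)):
        for j in range(len(m)):
            alpha = 1 if j == 2 * k else -1 if j == 2 * k + 1 else 0
            beta = -1 if i == 2 * k else 1 if i == 2 * k + 1 else 0
            r[i][j] = m[i][j] + (alpha + beta) * c
    return r

def widen(m, n):
    """Assumed widening: keep a bound only if the new iterate respects it."""
    return [[a if b <= a else INF for a, b in zip(rm, rn)]
            for rm, rn in zip(m, n)]

I, M = 0, 1                    # variable indices of i and m

entry = top(2)                 # invariant before the loop: i = 1, m unknown
entry[2 * I + 1][2 * I] = 2    # 2i <= 2
entry[2 * I][2 * I + 1] = -2   # -2i <= -2

def loop_body(head):
    """Propagate the loop-head invariant through the body i <- i + 1."""
    return assign_add(head, I, 1)

head = guard_diff_le(entry, I, M, 0)   # first iterate: (entry)_(i <= m)
rounds = 0
while True:
    rounds += 1
    nxt = widen(head, guard_diff_le(loop_body(head), I, M, 0))
    if nxt == head:            # stable: limit of the iteration sequence
        break
    head = nxt
```

After the loop, `head` encodes $1 \le i$ and $i - m \le 0$: the unstable bound $i \le 1$ has been dropped by the widening, while the relational bound introduced by the guard is stable and survives.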

B. Practical Results. 

The analysis described above has been implemented in 
OCaml and used on a small set of rather simple algorithms. 

Figure 11 shows the detailed computation for the lines 5-9
from Figure 1. Remark that the program has been adapted to
the language described in the previous section, and program
locations $l_0, \ldots, l_9$ have been added. Also, for the sake of
brevity, DBMs are presented in equivalent constraint set form,
and only the useful constraints are shown. Thanks to the
widening, the fixpoint is reached after only two iterations:
invariants $m^+_{k,0}$, $k = 2, \ldots, 8$, only hold in the first
iteration of the loop ($i = 1$); invariants $m^+_{k,1}$,
$k = 2, \ldots, 8$, hold for all loop iterations ($1 \le i \le m$).
At the end of the analysis, we have
$(-m \le a \le m) \in (m^+_9)^\bullet$.
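The final claim can be checked by hand from the invariant reported at $l_9$ in Figure 11, $\{i = m + 1;\ 1 - i \le a \le i - 1\}$. The following derivation is a sketch of what strong closure computes here by adding constraints pairwise; it uses only the constraints $a - i \le -1$, $-a - i \le -1$ and $i - m \le 1$, all of which follow from that invariant.

```latex
\begin{align*}
(a - i \le -1) \;\text{and}\; (i - m \le 1) &\implies a - m \le 0, \\
(-a - i \le -1) \;\text{and}\; (i - m \le 1) &\implies -a - m \le 0.
\end{align*}
```

Both conclusions are octagonal constraints, and together they read $-m \le a \le m$.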

Our analyzer was also able to prove that the well-known
Bubble sort and Heap sort do not perform out-of-bound errors
while accessing array elements and to prove that Lamport's
Bakery algorithm [19] for synchronizing two processes is
correct; however, unlike the example in Figure 1, these
analyses were already in the range of our PADO-II article [1].

C. Precision and Cost. 

The computation speed in our abstract domain is limited
by the cost of the strong closure algorithm because it is the
most used and the most costly algorithm. Thus, most abstract
operators have an $O(N^3)$ worst-case time cost. Because a fully
featured tool using our domain is not yet available, we do not
know how well this analysis scales up to large programs.

The invariants computed are always more precise than the
ones computed in [1], which itself always gives better results
than the widespread interval domain [5]; but they are less
precise than the costly polyhedron analysis [6]. Possible losses of
precision have three causes: non-exact union, non-exact guard
and assignment transfer functions, and widening in loops. The
first two causes can be worked out by refining Definition 2 and
choosing to represent, as abstract state, any finite union of
octagons instead of a single one. Promising representations
are the Clock-Difference Diagrams (introduced in 1999 by
Larsen, Weise, Yi, and Pearson [20]) and Difference Decision
Diagrams (introduced in Møller, Lichtenberg, Andersen, and
Hulgaard's CSL'99 paper [21]), which are tree-based structures
introduced by the model-checking community to efficiently
represent finite unions of $\mathcal{V}^0$-domains, but they need
adaptation in order to be used in the abstract interpretation
framework and must be extended to octagons.

    (l_0) a ← 0; i ← 1 (l_1)
    while i ≤ m do (l_2)
        if ?
        then (l_3) a ← a + 1 (l_4)
        else (l_5) a ← a − 1 (l_6)
        fi (l_7)
        i ← i + 1 (l_8)
    done (l_9)

$m^+_0 = \top$
$m^+_1 = \{i = 1;\ a = 0;\ 1 - i \le a \le i - 1\}$

First iteration of the loop

$m^+_{2,0} = \{i = 1;\ a = 0;\ 1 - i \le a \le i - 1;\ i \le m\}$
$m^+_{3,0} = m^+_{5,0} = m^+_{2,0}$
$m^+_{4,0} = \{i = 1;\ a = 1;\ 2 - i \le a \le i;\ i \le m\}$
$m^+_{6,0} = \{i = 1;\ a = -1;\ -i \le a \le i - 2;\ i \le m\}$
$m^+_{7,0} = \{i = 1;\ a \in [-1, 1];\ -i \le a \le i;\ i \le m\}$
$m^+_{8,0} = \{i = 2;\ a \in [-1, 1];\ 1 - i \le a \le i - 1;\ i \le m + 1\}$

Second iteration of the loop

$m^+_{3,1} = m^+_{5,1} = m^+_{2,1} = m^+_{2,0} \triangledown (m^+_{8,0})_{(i \le m)} = \{1 \le i \le m;\ 1 - i \le a \le i - 1\}$
$m^+_{4,1} = \{1 \le i \le m;\ 2 - i \le a \le i\}$
$m^+_{6,1} = \{1 \le i \le m;\ -i \le a \le i - 2\}$
$m^+_{7,1} = \{1 \le i \le m;\ -i \le a \le i\}$
$m^+_{8,1} = \{2 \le i \le m + 1;\ 1 - i \le a \le i - 1\}$

Third iteration of the loop

$m^+_{2,2} = m^+_{2,1}$ (fixpoint reached)
$m^+_2 = m^+_{2,1}$, $m^+_8 = m^+_{8,1}$
$m^+_9 = \{i = m + 1;\ 1 - i \le a \le i - 1\}$

Fig. 11. Detailed analysis of lines 5-9 from Figure 1. For the sake of
conciseness, DBMs are shown in their equivalent constraint set form and
useless constraints are not shown.

IX. Conclusion

In this article, we presented a new numerical abstract
domain that extends, without much performance degradation,
the DBM-based abstract domain described in our PADO-II
article [1]. This domain allows us to manipulate invariants of
the form $(\pm x \pm y \le c)$ with an $O(n^2)$ worst-case memory
cost per abstract state and an $O(n^3)$ worst-case time cost per
abstract operation, where $n$ is the number of variables in the
program.

We claim that our approach is fruitful since it allowed
us to automatically prove the correctness of some non-trivial
algorithms, beyond the scope of interval analysis, for a much
smaller cost than polyhedron analysis. However, our prototype
implementation did not allow us to test our domain on
real-life programs, and we still do not know whether it will
scale up. It is the author's hope that this new domain will be
integrated into currently existing static analyzers as an
alternative to the interval and polyhedron domains.